Genome Evolution in Plants: Complex Thalloid Liverworts (Marchantiopsida)

Abstract Why do some genomes stay small and simple, while others become huge, and why are some genomes more stable? In contrast to angiosperms and gymnosperms, liverworts are characterized by small genomes with low variation in size and conserved chromosome numbers. We quantified genome evolution among five Marchantiophyta (liverworts), measuring gene characteristics, transposable element (TE) landscape, collinearity, and sex chromosome evolution that might explain the small size and limited variability of liverwort genomes. No genome duplications were identified among examined liverworts and levels of duplicated genes are low. Among the liverwort species, Lunularia cruciata stands out with a genome size almost twice that of the other liverwort species investigated here, and most of this increased size is due to bursts of Ty3/Gypsy retrotransposons. Intrachromosomal rearrangements between examined liverworts are abundant but occur at a slower rate compared with angiosperms. Most genes on L. cruciata scaffolds have their orthologs on homologous Marchantia polymorpha chromosomes, indicating a low degree of rearrangements between chromosomes. Still, translocation of a fragment of the female U chromosome to an autosome was predicted from our data, which might explain the uniquely small U chromosome in L. cruciata. Low levels of gene duplication, TE activity, and chromosomal rearrangements might contribute to the apparent slow rate of morphological evolution in liverworts.


Introduction
Some genomes stay small and simple, while others evolve to become huge (Pellicer and Leitch 2020). Likewise, some genomes show higher levels of stability than others, which might be related to the complexity and diversification of the organisms. High rates of genome size evolution, but not the actual size of the genome, correlate with high rates of speciation in angiosperms, consistent with previous predictions that genome size variability is linked to success in flowering plants (Puttick et al. 2015).
Angiosperm nuclear genomes vary 1,000-fold in DNA content, from around 60 Mb (Genlisea tuberosa) to 149,000 Mb (Paris japonica [Pellicer et al. 2018]). Whole-genome duplications (WGDs) can account for such variation but also accumulation of transposable elements (TEs; reviewed in Bennetzen and Wang [2014]). In addition, genome size dynamics is influenced by differences in rates of DNA removal, epigenetic silencing of TEs, natural selection, and drift (Tenaillon et al. 2010). Small angiosperm genomes appear to be reduced in size due to effective genome reduction, while genomes of species in the lycophyte genus Selaginella appear to be small due to a low rate of genome size increase (Baniaga et al. 2016). Gymnosperm genomes are on average large with low diversity between lineages (16-fold [Pellicer et al. 2018]). These large sizes are likely due to a slow constant accumulation of TEs together with slow removal of DNA (Bennetzen and Wang 2014). Genomes may also evolve through structural changes as a result of translocations, inversions, duplications, and deletions that shuffle the order and orientation of genomic sequences (Bennetzen and Wang 2014). These are often associated with ectopic recombination at repetitive elements and may act as a source of genetic novelty.
Bryophytes occupy a pivotal position in the phylogeny of land plants as the sister group to all other extant land plants. The diversification of bryophytes has been hypothesized to be slow due to an assumed slow rate of morphological change (Crum 1972). It has been speculated that this is accompanied by a slow rate of evolution at the molecular level (Stenøien 2008). Linde et al. (2021) showed that the molecular evolutionary rate, in terms of synonymous substitution rates, was slower compared with angiosperms but not as low as in gymnosperms.
A slow rate of diversification in bryophytes could also be due to a slow rate of structural changes in their genomes. In this study, we focus on liverworts (Marchantiophyta), one of the three major clades of the bryophytes, which are characterized by low variation in genome size and chromosome numbers (Berrie 1960;Bainard et al. 2013), by a lack of known ancient WGD events, and by smaller gene families (Bowman et al. 2017; One Thousand Plant Transcriptomes Initiative 2019). There are currently limited genomic data in bryophytes, with only a few genomes of phylogenetically divergent taxa available; thus, knowledge on the patterns and rates of structural changes is limited (Szövényi et al. 2021).
The aim of this study is to examine and quantify different aspects of structural genome evolution among liverworts. We study taxa representing contrasting divergence times to compare the changes of TE landscape complexity, sequence collinearity, and sex chromosome evolution to identify the processes underlying these structural changes.

Low Gene Duplication Rate in Liverworts
It has been estimated that, on average, 65% of the annotated genes in plant genomes have a duplicate copy (Panchy et al. 2016). WGD and tandem duplication account for the majority of plant duplicates in angiosperms, but TE-based mechanisms and retrotransposition also generate a significant number of duplicates (Panchy et al. 2016). Using the same criteria as Panchy et al. (2016), we found that the estimates of duplicated genes among liverworts are similar and in lower range of those of the 45-84% observed in other land plants (table 1; supplementary fig. S1, Supplementary Material online [Panchy et al. 2016]).
In agreement with the low duplication rate, liverwort genomes contain a lower number of genes compared with most other clades of plants (table 1; supplementary fig. S1, Supplementary Material online). The liverwort estimates are within the range of those of the hornworts Anthoceros angustus and Anthorceros agrestis but lower than that of the lycophyte Selaginella moellendorffii. This is not surprising considering the lack of reported ancient WGDs in liverworts and hornworts (Bowman et al. 2017;Lang et al. 2018;Zhang et al. 2020). The average gene size of liverworts is similar to those of other plant clades (table 1). Despite the similar number of genes in Lunularia cruciata and Marchantia polymorpha (∼17,000 vs. ∼19,000; table 1), the estimated genome size of L. cruciata is more than twice that of M. polymorpha (table 1).
The Larger Size of the L. cruciata Genome Is Largely Explained by TE Content The estimated repeat contents of the liverwort assemblies range from 25% for M. polymorpha ruderalis to 63% for L. cruciata (table 1). The relatively high repeat content of L. cruciata is in agreement with its almost doubled genome size when compared with M. polymorpha. To examine the reasons behind this higher repeat content, the landscape of TEs in the liverwort genomes was characterized. Indeed, most of the genome inflation of L. cruciata is explained by expansion of TEs ( fig. 2A, supplementary table S1, Supplementary Material online). The vast majority of this TE expansion emanates from a proliferation of one group of long terminal repeat (LTR) retrotransposons, the Ty3/ Gypsy superfamily. The M. polymorpha ruderalis genome contains a similar amount of the two main retrotransposons superfamilies Ty1/Copia and Ty3/Gypsy (fig. 1A [Montgomery et al. 2020]), while L. cruciata contains a large excess of Ty3/Gypsy elements (40% of all repeats). The relatively large proportion of repeats in L. cruciata denoted "unknown" may contain liverwort-specific TEs or reflect problems in the assembly of repetitive sequences using short read data.
Bursts of TE transposition can be identified by examining the divergence of individual TEs from the TE family consensus sequence. In L. cruciata, more than one peak is obvious in the distribution of divergence time ( fig. 1B), suggesting more than one burst of proliferation of Ty3/Gypsy elements. Similar but much less pronounced bursts were also evident in the other species that seem to have experienced a slower accumulation of TEs. A more detailed analysis of specific families of Ty3/Gypsy elements in L. cruciata reveals a complex pattern of proliferation of specific elements at several time points ( fig. 2). This pattern supports the idea that specific elements may mutate to evade hosts' silencing and proliferate for some time before host silencing reoccurs (Lisch and Slotkin 2011). A comparative analysis of Ty3/Gypsy and Ty1/Copia elements in L. cruciata and M. polymorpha ruderalis revealed that bursts seem to have occurred at similar time points in the two genomes, but the distributions of divergence are skewed toward lower divergence for L. cruciata, especially for Ty3/Gypsy elements (supplementary fig. S2, Supplementary Material online). This suggests either an increased transposon activity in the L. cruciata lineage or an increased efficiency of excision of transposed elements in the lineage leading to M. polymorpha. More data are needed to determine which of these alternatives are more important, but the latter hypothesis is supported by data on genome size evolution in liverworts (Bainard et al. 2013). The estimated number of full-length LTR retrotransposons, potentially functional, was 2,827 for the male genome of L. cruciata-considerably more than the 222 identified full-length LTRs in M. polymorpha ruderalis chromosome assembly. This difference supports more recent transposon insertions in L. cruciata.

Slow Rate of Chromosomal Rearrangements in Liverworts
Marchantia polymorpha ruderalis and L. cruciata diverged some 220 Ma before the split of the last common ancestor of Arabidopsis thaliana and the basal angiosperm Amborella trichopoda (181 Ma [Kumar et al. 2017]). A comparison of the decrease of collinearity with divergence Table 1 Genomic Features of the Included Species and Subspecies   Thus, our data suggest a multitude of chromosomal rearrangements, including inversions, since the separation of the lineages leading to L. cruciata and M. polymorpha. It should be noted that the frequency of translocations and other larger-scale-rearrangements (beyond the limited size of the L. cruciata scaffolds) is underestimated. Based on available data, it is not possible to estimate and compare rates within the two liverwort lineages. Thus, we cannot conclude if the rate of chromosomal rearrangements is associated with the higher TE activity in L. cruciata (Bennetzen and Wang 2014).

Sex Chromosome Evolution
Lunularia cruciata is dioicous, that is having separate male and female haploid individuals, with the presence of U or V sex chromosomes, respectively, and this is the presumed ancestral condition in liverworts (Berrie 1960;Bowman 2016;Iwasaki et al. 2021). In nonrecombining regions of the sex chromosomes, U and V chromosomes possess divergent gametolog pairs descended from a common ancestral autosomal gene. Previously, L. cruciata orthologs of M. polymorpha gametologs were identified from two transcriptomes representing female and male individuals (Iwasaki et al. 2021;supplementary   Although the structure of L. cruciata sex chromosomes is presently unresolved, genes residing on the U or V chromosomes should be present only in the female or male genomes, respectively. We first confirmed that in each case the gametolog orthologs of the female and male genomes matched those of the female and male transcriptomes, respectively ( fig. 5; supplementary fig. S4, Supplementary Material online; supplementary table S3, Supplementary Material online). We next examined whether the gametolog orthologs were unique to the female or male genomes, suggesting that the sequences reside on the U or V chromosome, or whether they were shared, which would suggest that sequences are autosomal (or pseudoautosomal). In each case where gametolog ortholog pairs were identified with distinct sex-specific sequences, they are located on genomic scaffolds that are unique to either the female or male. In contrast, in every case where the male and female gametolog orthologs have similar or identical sequences, the genomic scaffolds on which these genes reside are found in both the female and male genomes. Lastly, at least some orthologs of phylogenetically conserved M. polymorpha U-and V-specific genes may also be U-or V-linked in L. cruciata as they reside on scaffolds containing sequences found in genomic assemblies derived from only one sex ( fig. 5; supplementary table S4, Supplementary Material online). This is in contrast to scaffolds presumed to be autosomal where corresponding scaffolds in the opposite sex are typically >99% identical.

Discussion
It has been hypothesized that bryophytes, being structurally simple organisms with a predominantly haploid life cycle, have experienced slow molecular evolution, similar to their seemingly slow morphological diversification. Linde et al. (2021) showed that the silent site substitution rates of bryophytes are not lower when compared with vascular plants as a group. Bryophyte rates were lower than for angiosperms but higher than gymnosperms; thus, low substitution rates cannot alone explain a difference in morphologic evolution.
Bryophytes are characterized by small genomes. The mean (median in parentheses) of bryophyte genome sizes  [Pellicer and Leitch 2020]). This variation is very limited compared with the angiosperm lineage, where genome sizes vary 2,440-fold with a mean (median in parentheses) of 5,020 Mb (1,663 Mb), and although some of the largest genomes are observed within the angiosperms, their genome sizes are skewed toward smaller genomes (Pellicer et al. 2018).

Low Levels of Gene Duplication in Marchantiophyta
A low degree of gene duplication has been reported for M. polymorpha ruderalis (Bowman et al. 2017). Our data now suggest that the percentage of duplicated genes and tandem duplicated is similar between all the studied Marchantiopsida liverworts despite a large phylogenetic distance in the case of L. cruciata and M. polymorpha ruderalis. The low number of duplicated genes in Marchantiophyta liverworts can to a large extent be explained by a lack of ancient WGD events (Bowman et al. 2017). A considerable number of paralogs were all the same detected in liverworts, probably to a large extent resulting from tandem or segmental duplications ( fig. 1). Still, tandem duplications in Marchantiophyta are at the lower range compared with angiosperms (Bowman et al. 2017). Gene number estimates within Marchantiophyta are between those of the hornwort A. angustus and A. agrestis and similar to those of the charophyte alga Klebsormidium nitens. The lower gene number estimate in A. angustus is consistent with a reported high rate of gene family loss in this species (Zhang et al. 2020). As gene duplication may increase evolutionary potential (Ohno 2013), the low gene numbers of Marchantiophyta

Bursts of Transposable Element Accumulation in L. cruciata
Despite a limited variation in genome size among representatives of Marchantiopsida, the genome of L. cruciata is approximately twice the size when compared with that of M. polymorpha. This difference is mainly due to a higher copy number of a few families of TEs in L. cruciata that are the result of several bursts of amplification. These results show that such bursts, that are common in angiosperms, also occur in liverworts. However, extrapolating from our results this kind of burst seems rare, at least among Marchantiopsida. Even though the TE content is generally lower in the studied liverworts when compared with other most plant species, a similar collection of elements was observed. The most common TE elements were Ty3/Gypsy and Ty1/Copia, as has been reported in other plants (Voytas et al. 1992;Suoniemi et al. 1998). The ratio of Ty3/Gypsy to Ty1/Copia elements varies in the studied liverwort species, with Ty3/Gypsy being the most abundant in L. cruciata, whereas the two are present in similar proportions in M. paleacea. Most of this variation can be explained by multiple bursts of Gypsy families in L. cruciata.
The larger transposon accumulation over time in Lunularia when compared with the Marchantia species is not associated with any known radiation, and the order Lunulariales is considered monotypic with a single species, L. cruciata, in contrast to the much more speciose Marchantiales (Söderström et al. 2016).

Low Rate of Chromosomal Rearrangements in Marchantiopsida
Overall, there is a much higher average microsynteny across placental mammals compared with angiosperms (Zhao and Schranz 2019). Wang et al. (2012) reported 0.6-6.7% collinear ortholog pairs in pairwise comparisons between  (Bowman et al. 2017;Montgomery et al. 2020;Iwasaki et al. 2021). Gametologs are listed between the U and V chromosomes, with phylogenetically conserved U-and V-specific genes represented by short outward pointing lines (supplementary table S4, Supplementary Material online). Gametolog pairs predicted to have been in nonrecombining regions, presumably on sex chromosomes of the common ancestors of Lunularia and Marchantia, are in dark green, whereas those predicted to have been in recombining regions (either pseudoautosomal or autosomal) in the common ancestor are in light green (Iwasaki et al. 2021). The results of phylogenetic analyses of L. cruciata orthologs of M. polymorpha gametologs (supplementary fig. S4 and table S4, Supplementary Material online) are presented for the male and female transcriptomes and genomes. Orthologs are classified as U or V if the detected ortholog is more closely related to the U or V M. polymorpha gametolog, respectively, than the L. cruciata sequences to each other. Orthologs are defined as O (outgroup) if the two M. polymorpha gametologs are more closely related to each other than the L. cruciata ortholog. If the male and female transcriptomes possess the same gametolog, it is demarcated with a "1." For the genome, if the gene sequence is specific to the female or male genome, it is demarcated in red (U or O) or blue (V or O), respectively. If the sequence is found in both female and male genomes, it is demarcated in black (A, autosomal or pseudoautosomal). In each case where a single sequence is found in the transcriptomes, the gene is predicted to be (pseudo)autosomal. Note that although many of the L. cruciata genes represented are predicted to reside on the sex chromosomes, the chromosomal order and orientation of the genes in L. cruciata is unknown.
Genome Biol. Evol. 15(3) https://doi.org/10.1093/gbe/evad014 Advance Access publication 2 February 2023 GBE monocot and eudicot species that diverged approximately 160 Ma (Kumar et al. 2017). Because different methods and thresholds have been used, those values are not directly comparable with the values obtained in the present study. Still, regardless of the settings used in our analysis, the percentage collinear gene pairs between Marchantiopsida liverwort species pairs is always higher than those obtained from angiosperm pairs. Thus, a relatively low rate of chromosomal rearrangements is suggested for species within Marchantiopsida. Although the fragmented assembly of L. cruciata precludes a direct comparison of rates of inversions versus translocations, our results are consistent with the conclusion that structural changes within plant chromosomes are far more abundant than those between chromosomes (Dvorak et al. 2018). Linde et al. (2020) found that chromosome 2 of M. polymorpha subsp. montivagans was distinctly more divergent than any other parts of the genome in pairwise comparisons with the other subspecies of M. polymorpha. A higher degree of structural rearrangements on chromosome 2 of subsp. montivagans suggested that it might have been more strongly protected against introgression than other parts of the genome. This chromosome also had a higher proportion of transposons.

Sex Chromosome Evolution in L. cruciata
Among liverworts, the karyotype of L. cruciata with the V chromosome larger than the U is unusual and in contrast to the vast majority of liverworts with dimorphic sex chromosomes in which the U is larger than the V (Lorbeer 1934;Sergio and Viana 1973). A translocation of a fragment of the U chromosome to an autosomal location could account for this unique karyotype. The pattern of L. cruciata gametolog evolution is consistent with the hypothesis that a segment composed of both previously nonrecombining regions (including three gametologs and one U-specific gene) and potentially ancestrally pseudoautosomal regions was translocated from the U to an autosome. Consistent with this hypothesis and as predicted by Muller's ratchet (Muller 1932), the corresponding V-gametologs appear to have been lost.

Conclusions
Our analyses reveal clear differences in the patterns of genome evolution between liverworts and angiosperms resulting in smaller and less variable liverwort genomes. First, although angiosperms have gone through several rounds of ancient WGDs, no ancient WGD has been identified for liverworts. Even though tandem duplications are not uncommon in liverworts, their genomes generally contain fewer genes. This difference could restrict the evolutionary potential for liverworts, as less new genetic material is available for mutation, drift, and selection to act upon. Second, the rate of structural changes in terms of both genome size and rearrangements seems low when compared with angiosperms. These differences might be the result of lower transposon activity. We did observe bursts of primarily LTR TEs in liverworts, similar to what is prevalent in angiosperms. However, such events seem to be much more rare in liverworts. As transposon activity is thought to provide additional mutations that might speed up evolution in terms of both rates of adaptation and speciation (Chénais et al. 2012;Rebollo et al. 2012), the association between a low TE activity and a low rate of evolutionary change in liverworts is not unexpected.

Genome Assembly and Annotation
Genomes of five Marchantiopsida liverwort species/subspecies were included: M. polymorpha subsp. ruderalis, M. polymorpha subsp. polymorpha, M. polymorpha subsp. montivagans, Marchantia inflexa, and L. cruciata. Assembly and annotation of M. polymorpha subsp. ruderalis are publicly available (Bowman et al. 2017;Montgomery et al. 2020;Iwasaki et al. 2021) and so is the assembly of M. inflexa (Marks et al. 2019), which was used for only repeat characterization. Sequencing, assembly, and annotation of M. polymorpha subsp. polymorpha and subsp. montivagans are described in Linde et al. (2020). The genome of M. paleacea subsp. diptera was sequenced and assembled as described in Radhakrishnan (2017) and summarized in Linde et al. (2020).
The female genome of L. cruciata is described in Linde et al. (2021). BUSCO (Simão et al. 2015), the eukaryo-te_odb9 lineage data set, and CEGMA were used to assess the completeness of the assembly, which was 87.4% (89.1%) according to BUSCO and 90.32% (95.56%) according to CEGMA, with the rates in parenthesis also including fragmented/partial hits. GenomeScope (http://qb. cshl.edu/genomescope/) was used to verify the estimates for genome size and repeat content using a kmer-based statistical approach on the raw reads giving an estimate of the genome size (581 Mb) similar to the assembly size (580 Mb) but a lower estimate of the repeat content (35%) compared with the value estimated as described below in the repeat annotation section, likely due to the failure to detect older and more divergent repeats. Annotation of L. cruciata was performed as described in Linde et al. (2020). BUSCO was used to assess the completeness of the annotations. The M. polymorpha ruderalis annotation version 3.1, which is considered to be more or less complete, had a completeness of 97.4 (97.7%). The completeness of the annotations of M. polymorpha subsp. montivagans, subsp. polymorpha, L. cruciata, and M. paleacea was 92.8% (97.8%), 86.8% (96.7%), 85.8 (95.4%), and 90.8% (96.7%), respectively (fragmented hits included in numbers within parenthesis).
The genome of a male L. cruciata was assembled using the data set SRR8246012 (Dong et al. 2019). The assembly was performed using MaSuRCA version 3.4.2 (Zimin et al. 2013), with pair end, using default settings. The genome size of male L. cruciata was estimated on the basis of size of the female genome (581 Mb) and a 519 Mb genome was assembled. BUSCO (Simão et al. 2015), together with the eukaryota data set, was used to assess the completeness of the assembly, which was 93.4% (97.7%), with the rate in parenthesis including not only the complete but also the fragmented/partial hits. Assembly statistics are given in supplementary table S2, Supplementary Material online. Both L. cruciata assemblies with annotations are available at genomevolution.org (IDs 64629, 64630, and 64631).

Sex Chromosome Phylogenetics
Predicted L. cruciata gametolog orthologs were identified from available genome and transcriptome sequence data (Dong et al. 2019; One Thousand Plant Transcriptomes Initiative 2019; Iwasaki et al. 2021). Nucleotide sequences were manually aligned as amino acid translations using Se-Al v2.0a11 (http://tree.bio.ed.ac.uk/software/seal/). Ambiguously aligned sequences were removed and alignments of nucleotides were employed in subsequent Bayesian phylogenetic analysis using MrBayes 3.2.1 . The Bayesian analysis for the nucleotide data set was run for 500,000 generations, which was sufficient for convergence of the two simultaneous runs (split frequencies < 0.05). To allow for the burn-in phase, the initial 50% of the total number of saved trees was discarded. The graphic representation of the trees was generated using the FigTree (version 1.4.0) software (http://tree.bio.ed.ac. uk/software/figtree/). Nucleotide alignments are provided in supplementary file S1, Supplementary Material online.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).